AITopics | good policy

good policy

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Appendix: On the Expressivity of Markov Reward

Neural Information Processing Systems, Apr-25-2026, 14:37:29 GMT

We first address questions that might arise in response to the main text. Q: If Alice chooses a set of acceptable policies (SOAP), a partial ordering on policies (PO), or a partial ordering on trajectories (TO) for Bob to learn to solve, when can Alice determine that Bob has solved the task? A: Bob can be said to be doing better on a given task if his behavior improves, as is typical when evaluating behavior under reward. The difference with SOAPs, POs, and TOs is that we measure improvement relative to the task rather than to reward. For instance, given a SOAP, we might say that Bob has solved the task once he has found one of the good policies, and we might measure Bob's progress on the task in terms of the distance of his greedy policy to one of the good policies (as done in our learning experiments). The same reasoning applies to POs and TOs: Bob is doing better on a task insofar as his greedy policy (or the trajectories it produces) sits higher up the ordering. That is, the studied reward functions must be a function of s, (s, a), or (s, a, s'). A: Indeed, as discussed in our introduction, our goal is to examine the expressivity of Markov rewards in the context of finite MDPs.
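A minimal sketch of the SOAP progress measure described above, assuming deterministic policies stored as dictionaries from state to action and assuming a Hamming-style distance (the number of states where two policies disagree). The experiments may use a different distance; the names `policy_distance`, `soap_progress`, and `solved` are hypothetical.

```python
from typing import Dict, List

# A deterministic policy maps each state name to an action name.
Policy = Dict[str, str]


def policy_distance(pi: Policy, sigma: Policy) -> int:
    """Number of states on which two deterministic policies choose different actions."""
    return sum(1 for s in pi if pi[s] != sigma[s])


def soap_progress(greedy: Policy, good_policies: List[Policy]) -> int:
    """Distance from Bob's greedy policy to the nearest policy in the SOAP.

    A value of 0 means the greedy policy is itself one of the good policies.
    """
    return min(policy_distance(greedy, g) for g in good_policies)


def solved(greedy: Policy, good_policies: List[Policy]) -> bool:
    """Bob has solved the task once his greedy policy is one of the good policies."""
    return soap_progress(greedy, good_policies) == 0


# Hypothetical two-state task whose SOAP contains two acceptable policies.
good = [{"s0": "left", "s1": "left"}, {"s0": "right", "s1": "right"}]
bob = {"s0": "left", "s1": "right"}  # one state away from either good policy
```

Under these assumptions `soap_progress(bob, good)` is 1, and it drops to 0 once Bob's greedy policy matches either acceptable policy.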

artificial intelligence, machine learning, reward function

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.96)

Finding good policies in average-reward Markov Decision Processes without prior knowledge

Neural Information Processing Systems, May-27-2025, 16:01:05 GMT

We revisit the identification of an $\varepsilon$-optimal policy in average-reward Markov Decision Processes (MDPs). In such MDPs, two measures of complexity have appeared in the literature: the diameter, $D$, and the optimal bias span, $H$, which satisfy $H \leq D$. Prior work has studied the complexity of $\varepsilon$-optimal policy identification only when a generative model is available. In this case, it is known that there exists an MDP with $D \simeq H$ for which the sample complexity to output an $\varepsilon$-optimal policy is $\Omega(SAD/\varepsilon^2)$, where $S$ and $A$ are the sizes of the state and action spaces. Recently, an algorithm with a sample complexity of order $SAH/\varepsilon^2$ has been proposed, but it requires knowledge of $H$.
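To make "$\varepsilon$-optimal" concrete, the sketch below computes the long-run average reward (gain) of every deterministic policy in a tiny MDP whose transition and reward tables are fully known, then keeps the policies whose gain is within $\varepsilon$ of the best. This is brute-force planning under the assumption that every policy induces an ergodic chain; it is not the sample-based identification problem studied in the paper, and it does not compute $D$ or $H$. The toy MDP and all function names are hypothetical.

```python
import itertools

import numpy as np


def stationary_distribution(P: np.ndarray) -> np.ndarray:
    """Stationary distribution mu of an ergodic chain: mu P = mu and sum(mu) = 1."""
    n = P.shape[0]
    # Stack the balance equations (P^T - I) mu = 0 with the normalisation row.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    mu, *_ = np.linalg.lstsq(A, b, rcond=None)
    return mu


def gain(P: np.ndarray, R: np.ndarray, policy: np.ndarray) -> float:
    """Average reward of a deterministic policy; P is (S, A, S) and R is (S, A)."""
    S = P.shape[0]
    P_pi = P[np.arange(S), policy]  # (S, S) chain induced by the policy
    r_pi = R[np.arange(S), policy]  # (S,) reward collected in each state
    return float(stationary_distribution(P_pi) @ r_pi)


def eps_optimal_policies(P: np.ndarray, R: np.ndarray, eps: float):
    """Enumerate all deterministic policies and keep those within eps of the best gain."""
    S, A = R.shape
    policies = [np.array(p) for p in itertools.product(range(A), repeat=S)]
    gains = [gain(P, R, p) for p in policies]
    g_star = max(gains)
    keep = [tuple(int(a) for a in p) for p, g in zip(policies, gains) if g >= g_star - eps]
    return g_star, keep


# Hypothetical 2-state, 2-action MDP in which every policy induces an ergodic chain.
P = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.1, 0.9], [0.9, 0.1]]])
R = np.array([[0.0, 0.0],
              [1.0, 0.5]])
g_star, good = eps_optimal_policies(P, R, eps=0.1)
# g_star is 0.9 (up to floating point) and good == [(1, 0)]
```

Loosening $\varepsilon$ to 0.5 also admits the policy `(0, 0)`, whose gain is 0.5.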

average-reward markov decision process, complexity, sample complexity

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.63)

Good Climate Solutions Need Good Policy--and AI Can Help With That

WIRED, Mar-1-2024, 08:00:00 GMT

To achieve real climate solutions, changing behavior and developing technology are not enough, says Michal Nachmany, founder and CEO of the environmental nonprofit Climate Policy Radar. "A lot of this is policy," she says. We need better laws, policies, and regulations, and we need to hold policymakers and corporations to account, because they're not doing a good enough job, she argues. The problem is that understanding which policies exist, and what works and what doesn't, is an enormous task. So Climate Policy Radar's goal is to use AI to make sense of the sprawling climate policy space and help ensure that future laws and policies are evidence-based.

climate policy radar, good climate solution, good policy

WIRED

Industry: Energy > Energy Policy (1.00)

Technology: Information Technology > Artificial Intelligence (1.00)

Reward Poisoning Attack Against Offline Reinforcement Learning

Xu, Yinglun, Gumaste, Rohan, Singh, Gagandeep

arXiv.org Artificial Intelligence, Feb-14-2024

We study the problem of reward poisoning attacks against general offline reinforcement learning with deep neural networks for function approximation. We consider a black-box threat model in which the attacker is completely oblivious to the learning algorithm, and the attacker's budget is limited by constraining both the amount of corruption at each data point and the total perturbation. We propose an attack strategy called 'policy contrast attack'. The high-level idea is to make some low-performing policies appear as high-performing while making high-performing policies appear as low-performing. To the best of our knowledge, we propose the first black-box reward poisoning attack in the general offline RL setting. We provide theoretical insights on the attack design and empirically show that our attack is efficient against current state-of-the-art offline RL algorithms on different kinds of learning datasets.
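The sketch below only illustrates the high-level contrast idea and the two budget constraints; it is not the paper's attack, which selects policies and perturbations far more carefully. It assumes a tabular dataset of (state, action) pairs, a known low-performing and high-performing deterministic policy, a per-point bound `per_point_eps` on each reward change, and a total budget measured as the L1 norm of all changes. The function name and the uniform-shrinking rule are hypothetical choices.

```python
import numpy as np


def policy_contrast_perturbation(states, actions, low_policy, high_policy,
                                 per_point_eps, total_budget):
    """Toy reward perturbation in the spirit of a 'policy contrast' attack.

    Rewards on transitions consistent with the low-performing policy are raised,
    and rewards on transitions consistent with the high-performing policy are
    lowered, subject to |delta_i| <= per_point_eps and sum(|delta_i|) <= total_budget.
    """
    states = np.asarray(states)
    actions = np.asarray(actions)
    low_match = actions == np.asarray(low_policy)[states]
    high_match = actions == np.asarray(high_policy)[states]

    delta = np.zeros(len(states))
    delta[low_match & ~high_match] = per_point_eps    # make the bad policy look good
    delta[high_match & ~low_match] = -per_point_eps   # make the good policy look bad

    used = np.abs(delta).sum()
    if used > total_budget:
        # Shrink uniformly so the L1 budget holds; each |delta_i| only gets smaller,
        # so the per-point bound stays satisfied.
        delta *= total_budget / used
    return delta


# Hypothetical dataset: two states, the low policy always picks action 0,
# the high policy always picks action 1.
delta = policy_contrast_perturbation(
    states=[0, 0, 1, 1], actions=[0, 1, 0, 1],
    low_policy=[0, 0], high_policy=[1, 1],
    per_point_eps=0.5, total_budget=1.0,
)
```

Here the unconstrained perturbation would spend 2.0 units, so it is scaled by one half to fit the total budget of 1.0.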

algorithm, attacker, dataset

arXiv.org Artificial Intelligence

2402.09695

Country: North America > United States > Illinois > Champaign County > Champaign (0.04)

Genre: Research Report (0.50)

Industry:

Transportation (0.69)
Information Technology > Security & Privacy (0.67)
Government > Military (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Building a Subspace of Policies for Scalable Continual Learning

Gaya, Jean-Baptiste, Doan, Thang, Caccia, Lucas, Soulier, Laure, Denoyer, Ludovic, Raileanu, Roberta

arXiv.org Artificial Intelligence, Mar-2-2023

The ability to continuously acquire new knowledge and skills is crucial for autonomous agents. Existing methods are typically based on either fixed-size models that struggle to learn a large number of diverse behaviors, or growing-size models that scale poorly with the number of tasks. In this work, we aim to strike a better balance between an agent's size and performance by designing a method that grows adaptively depending on the task sequence. We introduce Continual Subspace of Policies (CSP), a new approach that incrementally builds a subspace of policies for training a reinforcement learning agent on a sequence of tasks. The subspace's high expressivity allows CSP to perform well for many different tasks while growing sublinearly with the number of tasks. Our method does not suffer from forgetting and displays positive transfer to new tasks. CSP outperforms a number of popular baselines on a wide range of scenarios from two challenging domains, Brax (locomotion) and Continual World (manipulation).
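A minimal sketch of the "subspace of policies" idea, under simplifying assumptions: a policy is a parameter vector, the subspace is the set of convex combinations of anchor vectors, the best policy in the subspace is estimated by evaluating the anchors plus Dirichlet-sampled mixtures, and a new anchor is kept only if it improves the best score. The class name, the random-search estimate, and the user-supplied `score_fn` are hypothetical; CSP itself trains anchors and combination weights with reinforcement learning.

```python
import numpy as np


class PolicySubspace:
    """Policies as convex combinations of anchor parameter vectors (simplified sketch)."""

    def __init__(self, first_anchor):
        self.anchors = [np.asarray(first_anchor, dtype=float)]

    def params(self, alpha):
        # alpha lies on the simplex: non-negative weights that sum to 1.
        return sum(a * theta for a, theta in zip(alpha, self.anchors))

    def sample_alpha(self, rng):
        return rng.dirichlet(np.ones(len(self.anchors)))

    def best_score(self, score_fn, rng, n_samples=64):
        """Estimate the best policy in the subspace: all anchors plus random mixtures."""
        candidates = list(self.anchors)
        candidates += [self.params(self.sample_alpha(rng)) for _ in range(n_samples)]
        return max(score_fn(theta) for theta in candidates)

    def try_grow(self, new_anchor, score_fn, rng, threshold=0.0):
        """Add an anchor for a new task and keep it only if it helps; returns True if kept."""
        before = self.best_score(score_fn, rng)
        self.anchors.append(np.asarray(new_anchor, dtype=float))
        after = self.best_score(score_fn, rng)
        if after <= before + threshold:
            self.anchors.pop()  # prune: the existing subspace was already good enough
            return False
        return True


# Hypothetical task: the best parameters are [1, 1]; score is the negative squared error.
rng = np.random.default_rng(0)
target = np.array([1.0, 1.0])


def score(theta):
    return -float(np.sum((theta - target) ** 2))


sub = PolicySubspace([0.0, 0.0])
grew = sub.try_grow([1.0, 1.0], score, rng)      # new anchor reaches the target: kept
redundant = sub.try_grow([0.5, 0.5], score, rng)  # no improvement: pruned
```

Pruning redundant anchors is what lets the number of anchors grow more slowly than the number of tasks in this simplified setting.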

artificial intelligence, machine learning, reinforcement learning

arXiv.org Artificial Intelligence

2211.10445

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Genre:

Research Report (0.50)
Workflow (0.48)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Interpretable Option Discovery using Deep Q-Learning and Variational Autoencoders

Andersen, Per-Arne, Granmo, Ole-Christoffer, Goodwin, Morten

arXiv.org Artificial Intelligence, Oct-3-2022

Deep Reinforcement Learning (RL) is unquestionably a robust framework for training autonomous agents in a wide variety of disciplines. However, traditional deep and shallow model-free RL algorithms suffer from low sample efficiency and inadequate generalization for sparse state spaces. The options framework with temporal abstractions is perhaps the most promising method to solve these problems, but it still has noticeable shortcomings. It only guarantees local convergence, and it is challenging to automate initiation and termination conditions, which in practice are commonly hand-crafted. Our proposal, the Deep Variational Q-Network (DVQN), combines deep generative learning and reinforcement learning. The algorithm finds good policies from a Gaussian-distributed latent space, which is especially useful for defining options. The DVQN algorithm uses MSE with KL divergence as regularization, combined with traditional Q-learning updates. The algorithm learns a latent space that represents good policies with state clusters for options. We show that the DVQN algorithm is a promising approach for identifying initiation and termination conditions for option-based reinforcement learning. Experiments show that the DVQN algorithm, with automatic initiation and termination, has comparable performance to Rainbow and can maintain stability when trained for extended periods after convergence.
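The abstract names three ingredients: an MSE term, a KL-divergence regularizer, and a Q-learning update. The sketch below shows one plausible way to combine them into a single scalar loss, assuming a VAE that reconstructs the input state, a diagonal Gaussian latent with a standard-normal prior, and equal weighting of the squared TD error. The weighting, the `beta` coefficient, and all function names are assumptions, not the published DVQN objective.

```python
import numpy as np


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)


def td_target(reward, next_q, gamma, done):
    """One-step Q-learning target r + gamma * max_a' Q(s', a')."""
    return reward + (0.0 if done else gamma * np.max(next_q))


def dvqn_style_loss(state, reconstruction, mu, log_var, q_values, action,
                    reward, next_q, gamma=0.99, done=False, beta=1.0):
    """Hypothetical combined objective: reconstruction MSE + beta * KL + squared TD error."""
    mse = np.mean((np.asarray(state) - np.asarray(reconstruction)) ** 2)
    kl = kl_to_standard_normal(np.asarray(mu), np.asarray(log_var))
    td_error = td_target(reward, next_q, gamma, done) - q_values[action]
    return mse + beta * kl + td_error ** 2


# Perfect reconstruction and a latent equal to the prior leave only the TD term:
# target = 1 + 0.5 * 2 = 2, Q(s, a) = 1, so the loss is (2 - 1) ** 2 = 1.
loss = dvqn_style_loss(state=[0.2, 0.4], reconstruction=[0.2, 0.4],
                       mu=[0.0, 0.0], log_var=[0.0, 0.0],
                       q_values=np.array([0.0, 1.0]), action=1,
                       reward=1.0, next_q=np.array([2.0, 0.0]), gamma=0.5)
```

In a real implementation these terms would be computed on batches inside an autodiff framework so that gradients flow into the encoder, decoder, and Q-head.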

artificial intelligence, machine learning, reinforcement learning

arXiv.org Artificial Intelligence

doi: 10.1007/978-3-030-71711-7_11

2210.01231

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > Norway (0.04)

Genre: Research Report > Promising Solution (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

On the Expressivity of Markov Reward

Abel, David, Dabney, Will, Harutyunyan, Anna, Ho, Mark K., Littman, Michael L., Precup, Doina, Singh, Satinder

arXiv.org Artificial Intelligence, Nov-1-2021

Reward is the driving force for reinforcement-learning agents. This paper is dedicated to understanding the expressivity of reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract notions of "task" that might be desirable: (1) a set of acceptable behaviors, (2) a partial ordering over behaviors, or (3) a partial ordering over trajectories. Our main results prove that while reward can express many of these tasks, there exist instances of each task type that no Markov reward function can capture. We then provide a set of polynomial-time algorithms that construct a Markov reward function that allows an agent to optimize tasks of each of these three types, and correctly determine when no such reward function exists. We conclude with an empirical study that corroborates and illustrates our theoretical findings.
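The sketch below checks, by brute force, whether a given Markov reward function realizes a SOAP in a tiny MDP, using the criterion that every good policy must have strictly higher discounted start-state value than every bad policy. It only verifies a candidate reward; the paper's polynomial-time algorithms construct such a reward (or report that none exists), which this sketch does not attempt. The MDP, the discount factor, and the function names are hypothetical.

```python
import itertools

import numpy as np


def start_state_values(P, R, gamma, start=0):
    """Discounted start-state value of every deterministic policy; P is (S, A, S), R is (S, A)."""
    S, A = R.shape
    values = {}
    for pi in itertools.product(range(A), repeat=S):
        idx = np.array(pi)
        P_pi = P[np.arange(S), idx]  # (S, S) transitions under the policy
        r_pi = R[np.arange(S), idx]  # (S,) rewards under the policy
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)  # V = r_pi + gamma * P_pi V
        values[pi] = float(V[start])
    return values


def realizes_soap(P, R, gamma, good, start=0):
    """True if every good policy strictly out-values every bad policy from the start state."""
    values = start_state_values(P, R, gamma, start)
    good_vals = [v for pi, v in values.items() if pi in good]
    bad_vals = [v for pi, v in values.items() if pi not in good]
    return not bad_vals or min(good_vals) > max(bad_vals)


# Hypothetical deterministic MDP: action a always moves to state a, and action 1 pays 1.
P = np.zeros((2, 2, 2))
for s in range(2):
    for a in range(2):
        P[s, a, a] = 1.0
R = np.array([[0.0, 1.0],
              [0.0, 1.0]])
values = start_state_values(P, R, gamma=0.5)
# values: (0, 0) -> 0, (0, 1) -> 0, (1, 0) -> 4/3, (1, 1) -> 2
```

With this reward, the SOAP {(1, 1)} is realized, but the SOAP {(0, 0), (1, 0)} is not, because the bad policy (1, 1) out-values the good policy (0, 0).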

constraint, proceedings, reward function

arXiv.org Artificial Intelligence

2111.00876

Country:

North America > United States > Michigan (0.04)
North America > United States > Massachusetts (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)

NeurIPS: Shipra Agrawal on the appeal of reinforcement learning

#artificialintelligence, Dec-9-2020, 03:15:23 GMT

As deep neural networks have come to dominate AI, the Conference on Neural Information Processing Systems (NeurIPS) has become the most popular conference in the field. And at the most popular conference in the field, one of the most popular topics is reinforcement learning: at this year's NeurIPS, 95 accepted papers use the term in their titles. "Reinforcement learning is very, very powerful, because you can kind of learn anything, adaptively from the feedback, and by exploring the decision space," says Shipra Agrawal, an Amazon Scholar, an assistant professor in Columbia University's Industrial Engineering and Operations Research Department, and an area chair at NeurIPS, who studies reinforcement learning. "In concept, it's very akin to how humans learn, by trial and error, and how they adapt to what they see -- without requiring a loss function and so on, just by some kind of rewards or positive feedback." In reinforcement learning, an agent explores its environment, trying out different responses to different states of affairs, gradually learning a set of policies that will enable it to maximize some reward.
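The last sentence above describes the basic reinforcement-learning loop: try actions, observe rewards, and gradually settle on a reward-maximizing policy. A minimal illustration is tabular epsilon-greedy Q-learning on a hypothetical four-state corridor where only reaching the rightmost state pays a reward. The environment, hyperparameters, and function name are illustrative assumptions and are not drawn from Agrawal's work.

```python
import random


def q_learning_corridor(n_states=4, episodes=500, alpha=0.5, gamma=0.9,
                        epsilon=0.2, max_steps=100, seed=0):
    """Tabular epsilon-greedy Q-learning on a toy corridor.

    States are 0..n_states-1; action 0 moves left (clamped at 0), action 1 moves right.
    Reaching the rightmost state gives reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]

    def greedy(s):
        best = max(Q[s])
        return rng.choice([a for a in (0, 1) if Q[s][a] == best])  # random tie-break

    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            # Explore with probability epsilon, otherwise exploit current estimates.
            a = rng.randrange(2) if rng.random() < epsilon else greedy(s)
            s_next = max(0, s - 1) if a == 0 else s + 1
            done = s_next == goal
            r = 1.0 if done else 0.0
            target = r if done else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])  # move the estimate toward the target
            s = s_next
            if done:
                break
    return Q


Q = q_learning_corridor()
greedy_policy = [0 if Q[s][0] > Q[s][1] else 1 for s in range(3)]  # expected: all "right" (1)
```

No loss function over labeled examples is involved: the agent learns purely from the reward signal and its own trial and error.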

agrawal, neurips, reinforcement

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Reinforcement learning is supervised learning on optimized data

AIHub, Nov-5-2020, 09:52:00 GMT

The two most common perspectives on reinforcement learning (RL) are optimization and dynamic programming. Methods that compute the gradients of the non-differentiable expected reward objective, such as the REINFORCE trick, are commonly grouped into the optimization perspective, whereas methods that employ TD-learning or Q-learning are dynamic programming methods. While these methods have shown considerable success in recent years, they are still quite challenging to apply to new problems. In contrast, deep supervised learning has been extremely successful, and we may hence ask: can we use supervised learning to perform RL? In this blog post we discuss a mental model for RL, based on the idea that RL can be viewed as doing supervised learning on the "good data".
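One simple way to see "supervised learning on good data" is filtered behavior cloning on a multi-armed bandit: sample actions from the current policy, keep only the highest-reward samples, and refit the policy by maximum likelihood on those samples. The sketch below is one such instance under assumed settings (Gaussian reward noise, a top-20% filter, a small smoothing constant); the bandit means and the function name are hypothetical, and the blog's formal treatment may differ.

```python
import numpy as np


def filtered_behavior_cloning(true_means, iterations=10, batch=200,
                              keep_frac=0.2, seed=0):
    """RL as supervised learning on 'good data', on a toy multi-armed bandit.

    Each iteration samples actions from the current policy, observes noisy rewards,
    keeps the top keep_frac of samples by reward, and refits the policy by maximum
    likelihood, i.e. the empirical action frequencies of the kept samples.
    """
    rng = np.random.default_rng(seed)
    n_actions = len(true_means)
    policy = np.full(n_actions, 1.0 / n_actions)
    for _ in range(iterations):
        actions = rng.choice(n_actions, size=batch, p=policy)
        rewards = np.asarray(true_means)[actions] + rng.normal(0.0, 0.1, size=batch)
        n_keep = max(1, int(keep_frac * batch))
        kept = actions[np.argsort(rewards)[-n_keep:]]  # the "good data"
        counts = np.bincount(kept, minlength=n_actions).astype(float)
        # Supervised (maximum-likelihood) fit with light smoothing so no action hits zero.
        policy = (counts + 1e-3) / (counts + 1e-3).sum()
    return policy


policy = filtered_behavior_cloning([0.0, 0.5, 1.0])  # concentrates on the best arm (index 2)
```

The filtering step plays the role of optimization, while the refit is ordinary supervised learning; iterating the two gradually shifts the data distribution toward better behavior.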

data distribution, learning, supervised learning

AIHub

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Reinforcement learning is supervised learning on optimized data

#artificialintelligence, Oct-13-2020, 02:27:37 GMT

artificial intelligence, machine learning, reinforcement learning

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
